Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS)
patients and is essential for diagnosing the disease and monitoring its progression. In practice, lesion load
is often quantified by either manual or semi-automated segmentation of MRI, which is time-consuming, costly,
and associated with large inter- and intra-observer variability. We propose OASIS is Automated Statistical
Inference for Segmentation (OASIS), an automated statistical method for segmenting MS lesions in MRI studies.
We use logistic regression models incorporating multiple MRI modalities to estimate voxel-level probabilities
of lesion presence. Intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion
recovery and proton density volumes from 131 MRI studies (98 MS subjects, 33 healthy subjects) with manual
lesion segmentations were used to train and validate our model. Within this set, OASIS detected lesions
with a partial area under the receiver operating characteristic curve for clinically relevant false positive
rates of 1% and below of 0.59% (95% CI: [0.50%, 0.67%]) at the voxel level. An experienced MS neuroradiologist
compared these segmentations to those produced by LesionTOADS, an image segmentation software that
provides segmentation of both lesions and normal brain structures. For lesions, OASIS out-performed
LesionTOADS in 74% (95% CI: [65%, 82%]) of cases for the 98 MS subjects.

To further validate the method, we applied OASIS to 169 MRI studies acquired at a separate center. The neu- 
roradiologist again compared the OASIS segmentations to those from LesionTOADS. For lesions, OASIS ranked 
higher than LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For a randomly selected subset of 50 of these 
studies, one additional radiologist and one neurologist also scored the images. Within this set, the neurora- 
diologist ranked OASIS higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of cases, the neurologist 66% 
(95% CI: [52%, 78%]) and the radiologist 52% (95% CI: [38%, 66%]). 

OASIS obtains the estimated probability for each voxel to be part of a lesion by weighting each imaging mo- 
dality with coefficient weights. These coefficients are explicit, obtained using standard model fitting tech- 
niques, and can be reused in other imaging studies. This fully automated method allows sensitive and 
specific detection of lesion presence and may be rapidly applied to large collections of images. 

© 2013 The Authors. Published by Elsevier Inc. All rights reserved. 
1. Introduction

Multiple sclerosis (MS) is an inflammatory disease of the brain and 
spinal cord characterized by demyelinating lesions that are most easily 
identified, at least on magnetic resonance imaging (MRI) studies, in the 
white matter of the brain (Sahraian and Radue, 2007). Quantitative 
analyses of MRI, such as the number and volume of lesions, are essential 
for diagnosing the disease and monitoring its progression (Rovira and 
Leon, 2008; Rovira et al., 2009). MRI measures are also a common 
primary endpoint in phase II immunomodulatory drug therapy
trials (Sormani et al., 2009). In these trials, either manual or semi-automated
segmentations are used to compute the total number of lesions
and the total lesion volume (Llado et al., 2011). Manual delineation
is challenging as three-dimensional information from several MRI
modalities must be integrated (Llado et al., 2011). Manual assessment 
of MRI is also prone to large inter- and intra-observer variability 
(Simon et al., 2006). While semi-automated methods have been found 
to decrease inter- and intra-rater variability, they still require manual 
reader input and are time consuming (Garcia-Lorenzo et al., 2013). 
Therefore a sensitive and specific automated method to detect lesions 
in the brain is essential for the analysis of studies with large numbers
of MS patients. 

Llado et al. (2011) provide a comprehensive review of currently
available automated cross-sectional MS lesion segmentation methods, 
or methods used to identify lesions from a single MRI study. We divide 
these methods into four categories: supervised classifier with an atlas, 
supervised classifier with no atlas, unsupervised classifier with an 
atlas, and unsupervised classifier with no atlas. We focus on supervised 
methods without atlases, as the method we propose is in this category. 
Supervised methods without atlases train on manually segmented im- 
ages annotated by experts and use image intensities of MRI to classify 
lesions (Llado et al., 2011). Supervised classification algorithms are 
applied to the volumes: artificial neural networks (Goldberg-Zimring 
et al., 1998), spatial clustering (Alfano et al., 2000), k-nearest neighbors 
(Anbeek et al., 2004, 2005, 2008), Parzen window (Sajja et al., 2006), 
Parzen window and morphological grayscale reconstruction (Datta 
et al., 2006), Bayes (Scully et al., 2008), AdaBoost (Morra et al., 2008), 
simulated annealing and Markov random fields (Subbanna et al., 
2009), and graph cuts (Lecoeur et al., 2009). All of the aforementioned 
methods except Anbeek et al. (2008) use multi-modality MRI informa- 
tion to classify lesions. The most widely-used feature across all seg- 
mentation methods is voxel intensity, which derives strength from a 
multi-modality approach (Llado et al., 2011). 

The method we propose uses a logistic regression model to assign 
voxel-level probabilities of lesion presence in structural MRI of patients
with MS. Logistic regression models have been used for segmentation
of brain tissues and pathology in MRI (Bullmore et al.,
1999; Dinh et al., 2012; Lee et al., 2005). For applications to MS, logis- 
tic regression has been used for detection of gadolinium enhancing 
lesions (Karimaghaloo et al., 2012), prediction of gadolinium enhanc- 
ing lesions without administering contrast agents (Shinohara et al., 
2012), and for segmentation of new and enlarging MS lesions 
(Sweeney et al., 2013). To our knowledge, logistic regression has not
been used in cross-sectional segmentation of MS lesions in structural 
MRI. 

One difficulty in automated segmentation of MRI is due to variable 
imaging acquisition parameters (Llado et al., 2011). All of the seg- 
mentation methods reviewed in Llado et al. (2011) have tuning 
parameters that are adjusted to a particular data set and may not gen- 
eralize to a new data set with different acquisition parameters. These 
parameters are not informed by the data and therefore must be tuned 
empirically, often with little to no interpretability of the parameter. 
Application to a new data set may require several iterations of seg- 
mentations to adjust the tuning parameters to values that produce ac- 
ceptable segmentations. A method in which the tuning parameters 
are informed by the data and for which adjustments are intuitive 
and simple would therefore be valuable. 

A second difficulty in intensity-based segmentation is that MRI 
data are acquired in arbitrary units; units can vary widely between 
and within imaging centers. These variations are attributed to scan- 
ner hardware, interactions between hardware and patients, and var- 
iations in acquisition parameters (Simmons et al., 1994). Therefore, 
proper intensity normalization is essential in developing a generaliz- 
able segmentation method. Many of the segmentation methods 
use intensity-normalized volumes (Llado et al., 2011), but these
methods do not demonstrate the generalizability of the normalization
procedure to changes in imaging acquisition parameters and imaging 
centers. In Garcia-Lorenzo et al. (2013) the authors performed a 
PubMed and Google Scholar search for MS lesion segmentation pa- 
pers. Of the 47 papers that met their search criteria, only 13 of these 
papers used multicenter data for validation, and the largest database 
used for validation consisted of 41 subjects. To show generalizability, 
methods must be validated on multicenter data with many subjects. 

A third difficulty is intensity inhomogeneity, the slow spatial in- 
tensity variations of the same tissue within an MRI volume. Inhomo- 
geneity can significantly reduce the accuracy of image segmentation 
(Hou, 2006), and therefore some form of spatial normalization is nec- 
essary for accurate lesion segmentation. Most lesion segmentation 
methods assume that these inhomogeneities have been corrected 
during image preprocessing, but we have found strong spatial pat- 
terns within tissue type even after the N3 inhomogeneity correction 
algorithm (Sled et al., 1998) is applied. 

To address these and related problems, we propose OASIS is Auto- 
mated Statistical Inference for Segmentation (OASIS), a fully automat- 
ed, generalizable, and novel statistical method for cross-sectional MS 
lesion segmentation. Using intensity information from multiple mo- 
dalities of MRI, a logistic regression model assigns voxel-level proba- 
bilities of lesion presence. After training on manual segmentations, 
the OASIS model produces interpretable results in the form of regres- 
sion coefficients that can be applied to imaging studies quickly and 
easily. OASIS uses intensity-normalized brain MRI volumes, enabling 
the model to generalize to changes in scanner and acquisition se- 
quence. OASIS also adjusts for intensity inhomogeneities that prepro- 
cessing bias field correction procedures do not remove, using 
smoothed volumes. This allows for more accurate segmentation of 
brain areas that are highly distorted by inhomogeneities, such as 
the cerebellum. One of the most practical properties of OASIS is that 
the method is fully transparent, easy to explain and implement, and 
simple to modify for new data sets. 

To illustrate the generalizability of OASIS to changes in imaging 
acquisition parameters, we evaluated the performance of the algo- 
rithm on a total of 300 MRI studies from two separate imaging centers 
with varying acquisition parameters. This is a crucial criterion for 
assessing the generalizability and utility of the method. 

2. Materials and methods 

In this section we introduce OASIS, a method inspired by Subtrac- 
tion Based Inference for Modeling and Estimation (SuBLIME), an au- 
tomated method for the longitudinal segmentation of incident and 
enlarging MS lesions (Sweeney et al., 2013). Before the OASIS logistic 
regression model is fit, a brain tissue mask is created, all MRI volumes 
are intensity normalized, and smoothed volumes are created to cap- 
ture local spatial information and adjust for remaining field inhomo- 
geneities. The OASIS method involves two iterations of model fitting: 
the first to perform an initial lesion segmentation and the second to 
use this initial lesion segmentation to remove lesions, which can dis- 
tort the smoothed volumes. After the final model is fit, the regression 
coefficients are applied to produce three dimensional maps of voxel- 
level probabilities of lesion presence. 

We evaluate the performance of OASIS on MRI volumes of the 
brain acquired with various acquisition protocols. We use data sets 
from two different imaging centers for validation, which we refer to 
as Validation Set 1 and Validation Set 2. Validation Set 1 has manual 
lesion segmentations. We trained the OASIS method on a subset of 
the studies in this dataset, and tested on the remaining studies. An ex- 
pert evaluated the segmentations from Validation Set 1. Validation 
Set 2 is used to demonstrate generalizability to changes in image ac- 
quisition parameters. We applied the coefficients from the model 
trained on Validation Set 1 to the studies in Validation Set 2, and ex- 
perts evaluated the OASIS lesion segmentations. 



E.M. Sweeney et al. / NeuroImage: Clinical 2 (2013) 402-413



2.1. Study population 

Validation Set 1 contains a total of 131 MRI studies from 131 sub- 
jects. Of these studies, 98 are from patients with MS and 33 are 
healthy volunteer scans. Of the 98 patients with MS, the median age 
is 44 years (IQR: [33, 54]), 72 are female (26 male), and the median 
EDSS is 3.5 (IQR: [2, 6]). The median age of the healthy volunteers is
34 (IQR: [28, 42]) and 19 are female (14 male). 

Validation Set 2 contains a total of 169 MRI studies from 149 sub- 
jects. Twenty subjects in Validation Set 2 have baseline and follow-up 
scans. The mean time between baseline and follow-up for these 20 
subjects is 132 days (IQR: [51, 182]). The subjects in the validation 
set are a mixture of healthy volunteers and patients: 110 of the pa- 
tients have MS, 38 have other neurological diseases, and one is a 
healthy volunteer. The median age of the MS patients is 42 (IQR:
[33, 50]); 54 are female (56 male); 68 have relapsing remitting MS,
31 have primary progressive MS, and 11 have secondary progressive
MS. The median age of the patients with other neurological diseases
is 41 years (IQR: [35, 51]) and 8 are female (30 male). The healthy
volunteer is a 28-year-old female.

2.2. Experimental methods 

T1-weighted, T2-weighted, fluid-attenuated inversion recovery
(FLAIR) and proton density (PD) volumes were acquired for all subjects
at each study, and all imaging protocols were approved by local institutional
review boards. For Validation Set 1, 3D T1-MPRAGE images
(repetition time (TR) = 10 ms; echo time (TE) = 6 ms; flip angle
(FA) α = 8°; inversion time (TI) = 835 ms; resolution = 1.1 mm ×
1.1 mm × 1.1 mm), 2D T2-weighted pre-contrast FLAIR images
(TR = 11,000 ms; TE = 68 ms; TI = 2800 ms; in-plane resolution =
0.83 mm × 0.83 mm; slice thickness = 2.2 mm), T2-weighted and PD
images (TR = 4200 ms; TE = 12/80 ms; resolution = 0.83 mm ×
0.83 mm × 2.2 mm) were acquired on a 3 T MRI scanner (Philips
Medical Systems, Best, The Netherlands).

For Validation Set 2, the 3D T2-weighted post-contrast FLAIR was 
acquired using a variable flip angle sequence, the 2D PD and 
T2-weighted volumes using a dual-echo fast-spin-echo sequence, 
and the 3D T1-weighted volume using an inversion-prepared fast
spoiled gradient-echo sequence. These studies were acquired on a 
single 3 T MRI scanner (Signa Excite HDxt; GE Healthcare, Milwaukee, 
Wisconsin). Table 1 contains the ranges for the Validation Set 2 scan- 
ning parameters. 

2.3. Image preprocessing 

Before building our statistical model for the lesion segmentation, 
we preprocessed the images from Validation Set 1 and Validation 
Set 2 using the tools provided in Medical Image Processing Analysis 
and Visualization (MIPAV) (McAuliffe et al., 2001), TOADS-CRUISE 
(http://www.nitrc.org/projects/toads-cruise/), and Java Image Science
Toolkit (JIST) (Lucas et al., 2010) software packages. We first rigidly
aligned the T1-weighted image of each subject into the Montreal
Neurological Institute (MNI) standard space (voxel resolution 1 mm³).
We then registered the FLAIR, PD, and T2-weighted images of each
subject to the aligned T1-weighted images. We also applied the N3 inhomogeneity
correction algorithm (Sled et al., 1998) to all images and
removed extracerebral voxels using SPECTRE, a skull-stripping procedure
(Carass et al., 2011).

Table 1
Ranges for Validation Set 2 scanning parameters.

               FA (degrees)   TR (ms)        TE (ms)          TI (ms)
FLAIR          90             (4800, 8802)   (124.3, 151.4)   (1481, 2200)
T2-weighted    90             5317           (116.2, 124.2)   NA
PD             90             5317           (16.0, 23.7)     NA
T1-weighted    (6, 13)        (8.7, 9.1)     (3.2, 3.6)       (450, 725)

2.4. Statistical modeling and spatial smoothing 

We performed all statistical modeling in the R environment (ver- 
sion 2.12.0, R Foundation for Statistical Computing, Vienna, Austria) 
with the packages AnalyzeFMRI (Bordier et al., 2009), biglm 
(Lumley, 2009), ff (Adler et al., 2011), and ROCR (Sing et al., 2009). 
We used the FSL tool fslmaths (http://www.fmrib.ox.ac.uk/fsl) for 
the three dimensional spatial smoothing of the volumes. 

2.5. Brain tissue mask 

The first step in OASIS is to create a mask of the brain that excludes 
cerebrospinal fluid (CSF). CSF is excluded because it disrupts the cap- 
ture of the inhomogeneity field and distorts the representation of the 
local cerebral features when creating smoothed volumes. To make 
this mask, we used the extracerebral voxel removal mask described 
in the Image Preprocessing Section and excluded voxels in the mask 
that appear hypointense in the FLAIR volume. Because CSF is 
hypointense in the FLAIR, we empirically found that excluding voxels 
below the 15th percentile of FLAIR intensities over the
extracerebral voxel removal mask removes CSF outside of the brain 
and in the ventricles. We refer to this mask as the brain tissue mask. 
Fig. 1B shows a slice of the brain tissue mask for a particular subject
for illustration. 
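
The percentile rule above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation (the paper's modeling was done in R): the function names, the flat-list representation of a volume, and the linear-interpolation percentile are all assumptions made for the example.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def brain_tissue_mask(flair, brain_mask, q=15.0):
    """Hypothetical helper: keep voxels inside brain_mask whose FLAIR
    intensity is at or above the q-th percentile of FLAIR intensities
    taken over brain_mask (the skull-stripped mask)."""
    inside = [x for x, m in zip(flair, brain_mask) if m]
    cutoff = percentile(inside, q)
    return [bool(m) and x >= cutoff for x, m in zip(flair, brain_mask)]


# Toy example: 20 voxels inside the skull-stripped mask, 2 outside.
flair = [float(x) for x in range(1, 21)] + [999.0, 999.0]
brain_mask = [True] * 20 + [False] * 2
tissue = brain_tissue_mask(flair, brain_mask)
print(sum(tissue), "of", sum(brain_mask), "voxels kept")
```

Whether a voxel exactly at the cutoff is kept or dropped is not stated in the text; the sketch keeps it.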

2.6. Intensity normalization 

We used intensities from the FLAIR, PD, T2-weighted, and T1-weighted
volumes to identify the presence of MS lesions. We denote
the observed intensity of voxel v, for subject i by:

$$M_i^O(v), \quad M = \mathrm{FLAIR}, \mathrm{PD}, \mathrm{T2}, \mathrm{T1}$$

where M indicates the imaging sequence.

MRI volumes are acquired in arbitrary units. Analyzing images 
across subjects and imaging centers requires that images be normal- 
ized so that voxel intensities have common interpretations. For nor- 
malization, we adapt the normalization method from Shinohara et 
al. (2011) to normalization with respect to the brain tissue mask. 
The normalized intensity of voxel v, for subject i is denoted by:

$$M_i^N(v) = \frac{M_i^O(v) - \mu_{i,M}}{\sigma_{i,M}}$$

where $\mu_{i,M}$ and $\sigma_{i,M}$ are the mean and standard deviation of the observed
voxel intensities in the brain tissue mask of subject i, from sequence
M. The normalized voxel intensities are standard scores of the
brain tissue mask. Fig. 1A shows a slice of the normalized images from
all four modalities from a single subject with MS: FLAIR, T2-weighted,
PD, and T1-weighted.
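
As a minimal sketch of this normalization, the following Python computes standard scores using the mean and standard deviation taken over the brain tissue mask only. The function name and the flat-list volume representation are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, since the text does not say which estimator was used.

```python
import math


def normalize_over_mask(volume, mask):
    """Return (x - mu) / sigma for every voxel, where mu and sigma are
    computed only from voxels inside the mask (assumed population SD)."""
    inside = [x for x, m in zip(volume, mask) if m]
    n = len(inside)
    if n == 0:
        raise ValueError("mask selects no voxels")
    mu = sum(inside) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in inside) / n)
    if sigma == 0.0:
        raise ValueError("zero variance inside the mask")
    return [(x - mu) / sigma for x in volume]


# Toy example: the last voxel is outside the mask, so it does not
# influence mu or sigma, but it is still rescaled with them.
z = normalize_over_mask([2.0, 4.0, 6.0, 100.0], [True, True, True, False])
print(z)
```

Because each modality is scaled by its own within-mask statistics, intensities become comparable across subjects even though the raw units are arbitrary.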

2.7. Smoothed volumes 

To account for intensity inhomogeneities that remain after initial 
inhomogeneity correction, we use a sequence of multiresolution 
smoothed volumes, obtained using different levels of smoothing. 
The smoothed volumes are created by three dimensional smoothing 
of the normalized volume from each modality over the brain tissue 
mask. A Gaussian smoother with relatively large kernel window size 
is used to smooth over the features in the brain and capture the pat- 
tern of the remaining inhomogeneity. 

For subject i and imaging modality M, let k be the size of the kernel
window. Then the intensity in voxel v of the smoothed volume of
imaging modality M is expressed as $\mathcal{G}M_i^N(v, k)$. The smoothed volumes
are used in the OASIS model to incorporate spatial information and to
account for inhomogeneities in the brain that persist after N3 correction.
For OASIS we use smoothed volumes as covariates with kernel
window sizes of 10 and 20 voxels, which were found empirically on
Validation Set 1 to work well. Fig. 2 shows the smoothed volumes
for both kernel window sizes of 10 and 20 from each modality. The
kernel window size of 20 smooths over the anatomical features almost
completely, while the kernel window size of 10 still preserves
some of these features, such as the hyperintensities of the gray matter
in the FLAIR, T2-weighted, and PD volumes and hypointensities of the
gray matter in the T1-weighted volume.

Fig. 1. A. Axial slice from different modalities of intensity normalized brain MRI of a single subject: A1. FLAIR image. A2. T2-weighted image. A3. PD image. A4. T1-weighted image. B. Brain
tissue mask of an axial slice of the brain. C. Axial slice of select voxels for OASIS modeling. D. Manual lesion segmentation of an axial slice of the brain. E. Axial slice of brain tissue
mask with dilated lesion mask made at a false positive rate of 1% removed. F. Axial slice of the smoothed probability map with intensity scale. G. Binary segmentation of the
probability map from the OASIS model at false positive rate of 0.005 overlaid on the FLAIR image.

2.8. OASIS is Automated Statistical Inference for Segmentation 

In this section we introduce the OASIS model. OASIS uses logistic 
regression to model the probability that a voxel is part of a lesion. 
We choose logistic regression because it is extremely simple and
easy to interpret. We model lesions at the voxel level using FLAIR,
PD, T2-weighted, and T1-weighted intensities as well as the intensities
from the smoothed volumes of each modality with kernel window
sizes of 10 and 20 voxels. The model must be trained on a gold
standard measure of lesion presence. Fig. 1D is an example of manual
lesion segmentation, which is an appropriate gold standard measure 
for the OASIS model. The result of our model is a collection of coeffi- 
cients that can be used to create three-dimensional maps of the prob- 
abilities of lesion presence. OASIS obtains the estimated logit of the 
probability of each voxel being part of a lesion by weighting these 
12 images (the four imaging modalities and smoothed volumes for 
each modality) with the coefficients. 

The first step of the modeling procedure is to select candidate voxels
to minimize false positives and computation time. Lesions appear as
hyperintensities in the FLAIR volume. The brain tissue mask was applied
to the FLAIR volume, and the 85th percentile and above of voxels in the
brain tissue mask were selected as candidate voxels for lesion presence.
In Validation Set 1, there were a total of 1,093,394 lesion voxels (a
volume of 1093 cm³). The voxel selection procedure excluded 64,556
(6%) of these voxels, but lowered the searchable area to 15% of the original
size. This procedure also decreases the number of potential false
positive voxels. Using this threshold also significantly decreases the
number of voxels the model must be fit on, allowing for a much faster
fit. Fig. 1C shows a slice of the voxel selection mask for a single subject.

Fig. 2. Axial slice from a single subject of the smoothed volumes from all modalities. Row one contains the smoothed volumes with kernel window size of 10 and row two contains
the smoothed volumes with kernel window size of 20. Column A contains the FLAIR images, B contains the T2-weighted images, C contains the PD images and D contains the
T1-weighted images. To link the figure with the notation used in this paper: A1. $\mathcal{G}\mathrm{FLAIR}_i^N(v, 10)$; A2. $\mathcal{G}\mathrm{FLAIR}_i^N(v, 20)$; B1. $\mathcal{G}\mathrm{T2}_i^N(v, 10)$; B2. $\mathcal{G}\mathrm{T2}_i^N(v, 20)$; C1. $\mathcal{G}\mathrm{PD}_i^N(v, 10)$; C2.
$\mathcal{G}\mathrm{PD}_i^N(v, 20)$; D1. $\mathcal{G}\mathrm{T1}_i^N(v, 10)$; D2. $\mathcal{G}\mathrm{T1}_i^N(v, 20)$; and E. Scale of intensities in the smoothed volumes.

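We then fit a voxel-level logistic regression model over the candidate
voxels. In the OASIS model, the probability that a voxel is part of
a lesion is represented as $P\{L_i(v) = 1\}$, where L is a random variable
denoting voxel-level lesion presence. If there is a lesion in voxel v
for subject i, then $L_i(v) = 1$. Otherwise, $L_i(v) = 0$. The probability
that a voxel v contains lesion incidence is modeled with the following
logistic regression model:

$$
\begin{aligned}
\mathrm{logit}[P\{L_i(v) = 1\}] = \beta_0
&+ \beta_1 \mathrm{FLAIR}_i^N(v) + \beta_2 \mathcal{G}\mathrm{FLAIR}_i^N(v, 10) + \beta_3 \mathcal{G}\mathrm{FLAIR}_i^N(v, 20) \\
&+ \beta_4 \mathrm{PD}_i^N(v) + \beta_5 \mathcal{G}\mathrm{PD}_i^N(v, 10) + \beta_6 \mathcal{G}\mathrm{PD}_i^N(v, 20) \\
&+ \beta_7 \mathrm{T2}_i^N(v) + \beta_8 \mathcal{G}\mathrm{T2}_i^N(v, 10) + \beta_9 \mathcal{G}\mathrm{T2}_i^N(v, 20) \\
&+ \beta_{10} \mathrm{T1}_i^N(v) + \beta_{11} \mathcal{G}\mathrm{T1}_i^N(v, 10) + \beta_{12} \mathcal{G}\mathrm{T1}_i^N(v, 20) \\
&+ \beta_{13} \mathrm{FLAIR}_i^N(v) \times \mathcal{G}\mathrm{FLAIR}_i^N(v, 10) + \beta_{14} \mathrm{FLAIR}_i^N(v) \times \mathcal{G}\mathrm{FLAIR}_i^N(v, 20) \\
&+ \beta_{15} \mathrm{PD}_i^N(v) \times \mathcal{G}\mathrm{PD}_i^N(v, 10) + \beta_{16} \mathrm{PD}_i^N(v) \times \mathcal{G}\mathrm{PD}_i^N(v, 20) \\
&+ \beta_{17} \mathrm{T2}_i^N(v) \times \mathcal{G}\mathrm{T2}_i^N(v, 10) + \beta_{18} \mathrm{T2}_i^N(v) \times \mathcal{G}\mathrm{T2}_i^N(v, 20) \\
&+ \beta_{19} \mathrm{T1}_i^N(v) \times \mathcal{G}\mathrm{T1}_i^N(v, 10) + \beta_{20} \mathrm{T1}_i^N(v) \times \mathcal{G}\mathrm{T1}_i^N(v, 20) \qquad [1]
\end{aligned}
$$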

The effect of magnetic field inhomogeneities is thought to be multi- 
plicative, so we use the interactions between the normalized volume 
and the smoothed volume in the model. 
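
To make the structure of model [1] concrete, the sketch below builds the 21-term covariate vector for one voxel (intercept, 12 main effects, 8 interactions) and converts the logit to a probability. It is an illustration under assumptions: the function names and dictionary layout are hypothetical, and the coefficient values in the example are made up, not the fitted OASIS coefficients.

```python
import math

MODALITIES = ("FLAIR", "PD", "T2", "T1")  # order used in model [1]
KERNELS = (10, 20)                        # kernel window sizes


def design_row(norm, smooth):
    """Covariates of model [1] for one voxel.

    norm:   dict modality -> normalized intensity
    smooth: dict (modality, kernel) -> smoothed normalized intensity
    """
    row = [1.0]                            # beta_0 (intercept)
    for m in MODALITIES:                   # beta_1 .. beta_12 (main effects)
        row.append(norm[m])
        for k in KERNELS:
            row.append(smooth[(m, k)])
    for m in MODALITIES:                   # beta_13 .. beta_20 (interactions)
        for k in KERNELS:
            row.append(norm[m] * smooth[(m, k)])
    return row


def lesion_probability(beta, norm, smooth):
    """Inverse logit of the linear predictor in model [1]."""
    row = design_row(norm, smooth)
    if len(beta) != len(row):
        raise ValueError("expected %d coefficients" % len(row))
    logit = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up inputs for illustration only.
norm = {m: 0.0 for m in MODALITIES}
smooth = {(m, k): 0.0 for m in MODALITIES for k in KERNELS}
norm["FLAIR"] = 2.0
beta = [0.0] * 21
beta[1] = 1.0                              # weight on FLAIR only
print(lesion_probability(beta, norm, smooth))
```

Applying the same function to every candidate voxel of a study yields the probability map described in the text.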

2.9. OASIS model refinement 

The second iteration of the OASIS model fitting is done to reduce 
the influence of lesions in the smoothed volumes. First, we fit the 
model and use the estimated coefficients to create maps of the esti- 
mated probability of lesion presence at each voxel. To incorporate 
spatial information of the neighboring voxels and reduce noise, we 
smooth the estimated probabilities from the model using a Gaussian 
kernel with window size of 3 mm. This kernel size was empirically 
chosen and found to perform well. The resulting probability maps 
were then thresholded using a liberal false positive rate of 1% (threshold
value of 0.10), which resulted in model-based hard segmentations
of lesions. These lesion masks were then dilated by 5 voxels to ensure
that the entire lesion was captured and removed from the brain tissue
mask. Fig. 1E shows the brain tissue mask with the lesions removed.
New smoothed volumes were created by applying a Gaussian 
smoother with kernel window sizes of 10 and 20 to the normalized 
image from each modality over the brain tissue mask with the lesions 
removed. We inpainted the smoothed volumes to fill the places 
where lesions were removed with the values we would expect in 
this area if it were occupied by normal, healthy tissue. 

The intensity in voxel v of the normalized image after the second
Gaussian smoother has been applied is labeled as $\mathcal{G}_2 M_i^N(v, k)$. Fig. 3
shows an axial slice for a subject of the FLAIR volume and the
smoothed volume for this image with kernel window sizes of 10
and 20 before and after the lesions were removed. To link the figure
with the notation, Fig. 3A shows $\mathrm{FLAIR}_i^N(v)$, Fig. 3B shows a scale of intensities
in the smoothed volumes, Fig. 3C1 shows $\mathcal{G}\mathrm{FLAIR}_i^N(v, 10)$,
Fig. 3C2 shows $\mathcal{G}_2\mathrm{FLAIR}_i^N(v, 10)$, Fig. 3D1 shows $\mathcal{G}\mathrm{FLAIR}_i^N(v, 20)$, and
Fig. 3D2 shows $\mathcal{G}_2\mathrm{FLAIR}_i^N(v, 20)$. The lesions are captured in the first
smoothed volume, especially with the kernel size of 10, but are not
captured in the second smoothed volume. The model [1] was refit
over the same voxels using the second smoothed volume to obtain
the final coefficients that are used to create the final probability
maps. Again, the final estimated probabilities are smoothed using a
Gaussian kernel with window size of 3 mm. Fig. 1F shows a slice of
the probability map for a subject and a scale of intensities. Red indicates
areas with a higher probability of being a lesion and blue indicates
areas with a lower probability of being a lesion.

2.10. Probability map and binary segmentation

Using this fitted model to generate a probability map for the entire 
brain from a set of new images takes about 30 min for each study 
using a standard workstation. The Gaussian smoothing is the slowest 
step of the algorithm and takes approximately one minute for each 
volume. These computations can be parallelized to take substantially 
less time; the entire algorithm can be run in approximately 5 min 
with 8 cores. To make a probability map for a new study, the two 
sets of regression coefficients, a brain mask, and the FLAIR, PD,
T2-weighted, and T1-weighted volumes are required. Using population-level
thresholds, the probability maps from OASIS can be used to create
hard segmentations of lesion presence. Fig. 1G shows a slice of a hard segmentation
overlaid on the FLAIR image. A summary of how to apply the
OASIS method to a new MRI study can be found in the Appendix.



2.11. Validation with gold standard: Validation Set 1

Validation Set 1, described in detail in the Materials and methods 
section, consists of 131 MRI studies: 98 studies from MS subjects and 
33 studies from healthy subjects. To fit the model and to measure per- 
formance, we required a set of data in which the outcome is assessed 
using a gold standard measure. The gold standard was obtained using 
manual segmentation by a technologist with more than 10 years of 
experience in delineating white matter lesions. The technologist 
spent between 30 min and an hour segmenting each study, depending
on the lesion load and distribution. The majority of the studies
had at least moderate pathology and therefore took between
45 min and an hour. The segmentations were made from the FLAIR
and T1-weighted volumes. Fig. 1D shows a manually segmented
slice for a subject. The mean volume of lesions for MS subjects in Validation
Set 1 is 11.2 cm³ (IQR: [1.7 cm³, 16.6 cm³]). It was assumed
that the healthy subjects did not have any lesions.

To evaluate the performance of our model within Validation Set 1, 
we trained the model [1] on 20 randomly selected subjects (15 MS 
subjects and 5 healthy subjects) and tested on the remaining 111 sub- 
jects (83 MS subjects and 28 healthy subjects). We used only the 
studies from the 111 subjects in this test set to estimate the 
voxel-level receiver operating characteristic (ROC) curve and area 
under the curve (AUC). These performance measures are known to 
be susceptible to instability. To account for this, we used a nonparametric
bootstrap, resampling subjects with replacement into the training and
testing sets. We then fit the model on the training set and observed
the performance of the model in the testing set.

It is known that the full AUC summarizes test performance over 
regions of the ROC space that are not clinically relevant for lesion seg- 
mentation (Sweeney et al., 2013). Once a test has been able to distin- 
guish well between disease and not disease, the performance of the 
test for particular applications must be evaluated, in which case one 
may be interested in only a small portion of the ROC curve 
(Obuchowski, 2003). In this particular application we are interested 
in using the lesion segmentation to identify lesions and to provide accurate
estimations of lesion volumes. The mean lesion volume of
manual lesion segmentations from Validation Set 1 is 11.2 cm³ (IQR:
[1.7 cm³, 16.6 cm³]). For the entire brain, a false positive rate of 0.01
would correspond to a volume of 12.8 cm³ of healthy brain being
falsely identified as lesion, which is more than the mean lesion volume
in Validation Set 1. Therefore we examined only false positive
rates below 1%. We provide the partial ROC curve with bootstrapped 
95% confidence bands for clinically relevant false positive rates of 1% 
and below. 
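
For orientation, the 12.8 cm³ figure is 1% of roughly 1280 cm³ (12.8 / 0.01), and an unnormalized partial AUC restricted to false positive rates of 1% and below can be at most 0.01, that is, 1%. The sketch below shows one way to compute such a partial AUC from voxel-level scores and labels by trapezoidal integration of the empirical ROC curve. It is an assumption-laden illustration: the paper's analysis used the ROCR package in R, and the function names here are hypothetical.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (fpr, tpr) points, starting at (0, 0).
    Higher scores are treated as more lesion-like; ties move together."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both lesion and non-lesion voxels")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for fpr in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example with four voxels (1 = lesion, 0 = not lesion).
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(partial_auc(pts, 1.0), partial_auc(pts, 0.25))
```

Bootstrapped confidence bands would follow by recomputing this quantity on each resampled training and testing split.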

2.12. Validation with expert rankings: Validation Set 1 and Validation Set 2

For the studies in Validation Set 2, gold standard segmentations 
were not available. To evaluate the performance of OASIS on Validation 
Set 2, three experts (a neuroradiologist, neurologist, and radiologist) 
compared OASIS segmentations to those from LesionTOADS, an open-source
lesion segmentation software (http://www.nitrc.org/projects/toads-cruise/)
(Shiee et al., 2008a,b, 2010). Validation Set 2, described
in detail in the Materials and methods section, consists of 169 MRI studies
of 149 subjects, 20 of whom had follow-up visits. These studies were
acquired using a variety of imaging protocols.
For the OASIS algorithm, the only parameter that must be tuned
when moving to a new dataset is the population-level threshold. 
For Validation Set 2 we used the coefficients that were trained on Val- 
idation Set 1 and then empirically adjusted the population level 
threshold for Validation Set 2. To adjust this threshold, we randomly 
sampled 10 subjects from Validation Set 2. We applied thresholds be- 
tween 0.10 and 0.50 (by increments of 0.05) to the probability maps, 
examined the segmentation, and empirically chose a threshold of 
0.35 for Validation Set 2. This threshold adjustment is very fast and 
transparent. We ran the segmentations for the 10 subjects in parallel, 
and each segmentation took less than 5 min. Next, we thresholded 
the probability maps at the 9 different thresholds, which took only 
seconds. Last, we looked through the segmentations and the original 
images to select the optimal (most reasonable) threshold, which took 
only about a minute for each subject. The entire process of tuning the 
threshold took less than an hour and involved only 10 min of manual 
image examination. This procedure only needs to be performed once 
when moving to a new imaging center or study. For the segmentation 
comparison, we presented the three experts with segmentations at 
the threshold value of 0.35 on all of the images in Validation Set 2 
as well as at the threshold from Validation Set 1 with a false positive 
rate of 0.005, a threshold value of 0.16. We will refer to the threshold 
value of 0.35 as the empirically adjusted threshold and the threshold 
value of 0.16 as the Validation Set 1 threshold. 
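
The threshold sweep described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the probability map is a toy nested list, the function names are hypothetical, and labeling a voxel as lesion when its probability is greater than or equal to the threshold is an assumption.

```python
def threshold_grid(start=0.10, stop=0.50, step=0.05):
    """Thresholds from start to stop inclusive, rounded to two decimals."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def binarize(prob_map, threshold):
    """Label a voxel as lesion (1) when its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def lesion_voxel_counts(prob_map, thresholds):
    """Number of voxels labeled as lesion at each candidate threshold."""
    return {t: sum(map(sum, binarize(prob_map, t))) for t in thresholds}


# Toy 2 x 2 probability map standing in for one slice of one subject.
toy_map = [[0.05, 0.20], [0.36, 0.90]]
for t, count in lesion_voxel_counts(toy_map, threshold_grid()).items():
    print(t, count)
```

The grid contains the 9 thresholds mentioned in the text. In practice each binarized map would be inspected next to the original images, and the single threshold that looks most reasonable across the sampled subjects would be kept.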

We compared both OASIS segmentations to the segmentations
produced by the open-source software LesionTOADS. We ran
LesionTOADS with T1-weighted and FLAIR inputs and the default
parameters, except that we changed the smoothing parameter from 0.2
to 0.4 because we empirically found this to improve the quality of the
segmentations. LesionTOADS segments not only lesions but also the
other tissue classes of the brain. For this analysis, we used only the
lesion segmentations.

We designed an image rating system to evaluate the performance 
of the two segmentation algorithms. For each of the 169 studies, we 
had three segmentations: the LesionTOADS segmentation, the OASIS 
segmentation with the threshold from Validation Set 1, and the 
OASIS segmentation with the empirically adjusted threshold. We 
also randomly selected 20 of the MRI studies and created duplicates 
of these to assess rating reliability, for a total of 189 studies. We ran- 
domized the order in which the segmentations were presented to the 
experts and randomly assigned each segmentation a letter: A, B, or C, 
so as to blind the rater to the segmentation algorithm. 
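
A blinded presentation schedule of this kind could be generated along the following lines. This is only a sketch under stated assumptions: the study identifiers, method labels, seed, and function name are hypothetical, and the text does not describe the tooling that was actually used.

```python
import random


def build_blinded_schedule(study_ids, methods, n_duplicates, seed=0):
    """Shuffle studies (with some duplicated) and hide each method behind a letter."""
    rng = random.Random(seed)
    originals = list(study_ids)
    duplicates = rng.sample(originals, n_duplicates)  # distinct studies shown twice
    presentation = originals + duplicates
    rng.shuffle(presentation)  # randomize presentation order

    schedule = []
    for position, study in enumerate(presentation):
        letters = ["A", "B", "C"]
        rng.shuffle(letters)  # fresh letter assignment for every presentation
        schedule.append({
            "position": position,
            "study": study,
            "labels": dict(zip(letters, methods)),  # unblinding key, hidden from raters
        })
    return schedule


methods = ["LesionTOADS", "OASIS_set1_threshold", "OASIS_empirical_threshold"]
schedule = build_blinded_schedule(range(1, 170), methods, n_duplicates=20)
print(len(schedule))  # 169 original studies plus 20 duplicates
```

Only the letters would be shown to the raters; the stored mapping is used afterwards to unblind the scores.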

We presented each of the 189 MRI studies to an experienced MS
neuroradiologist. For each study, the neuroradiologist examined the set
of three segmentations along with the original FLAIR, PD, T1-weighted,
and T2-weighted volumes. The neuroradiologist then scored the
performance of each segmentation on a continuous scale from 0 to 100,
with 0 being an unusable lesion segmentation and 100 being a perfect
segmentation. All three segmentations were presented simultaneously,
so scores were assigned relative to one another. Fifty of the studies
were also scored with the same system by a neurologist with a
subspecialty in MS and by a general radiologist in order to assess
agreement among the three raters. These 50 studies consisted of 45
randomly selected studies, with 5 of them repeated to assess rater
reliability.

The neuroradiologist also compared and scored the OASIS and 
LesionTOADS segmentations from the studies for the 98 MS patients 
in Validation Set 1. This allows for comparison of the performance 
of the segmentations on Validation Set 1 and Validation Set 2. 

3. Experimental results 

3.1. Validation Set 1: training with gold standard 

The OASIS model has an estimated full AUC of 98% (95% CI: [96%,
99%]) and a partial AUC for clinically relevant false positive rates of
1% and below of 0.59% (95% CI: [0.50%, 0.67%]) in the test set. Fig. 4
shows the voxel-level partial ROC curve for the test set with
bootstrapped 95% confidence bands for clinically relevant false
positive rates. The probability map threshold that corresponds to a false
positive rate of 1% is 0.10. The vertical axis of the partial ROC curve
shows the true positive rate (sensitivity) for probability map
thresholds of 0.10 and above, and the horizontal axis shows the
false positive rate (1 - specificity) for these thresholds.
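
To make the partial AUC concrete, the sketch below computes an ROC curve from voxel-level scores and integrates it up to a maximum false positive rate with the trapezoid rule. It is an illustration only: the function names are hypothetical, the data are toy values, and the use of an unnormalized area (rather than a standardized partial AUC) is an assumption about the convention behind the reported figure.

```python
def roc_points(scores, labels):
    """ROC points (fpr, tpr) obtained by lowering the threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all voxels tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points, max_fpr):
    """Unnormalized trapezoid area under the ROC curve for fpr in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the curve at the cutoff and stop the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: a classifier that separates the two classes perfectly.
points = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(partial_auc(points, max_fpr=0.01))
```

Under this unnormalized convention a perfect classifier attains at most 0.01 (1%) over false positive rates of 1% and below, which gives a scale against which a value such as 0.59% can be read.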

The coefficients from fitting the logistic model [1] over all 131 
studies in Validation Set 1, a total of 24 million voxels, are reported 
in the Appendix. The coefficients from the first and second fit of the 
model are provided. We also assessed the variation in the coefficients 
by nonparametrically bootstrapping the subjects with replacement. 
The bootstrapped 95% confidence intervals for the coefficients can 
be found in the Appendix. The variance of these coefficients is large 
in comparison to the estimates of the coefficients. The instability in 
the coefficients does not impact the performance of OASIS, as illus- 
trated in the stability of the partial ROC curve. 
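
The key point of the subject-level bootstrap is that whole subjects, not individual voxels, are resampled, so that all voxels from a drawn subject enter the refit together. The sketch below illustrates this with a toy statistic. The function names, the stand-in `fit` callable, and the use of simple percentile intervals are assumptions; the actual procedure refits the logistic model on each resample.

```python
import random
import statistics


def bootstrap_subjects(data_by_subject, fit, n_boot=200, seed=0):
    """Approximate 95% percentile interval for fit(), resampling whole subjects."""
    rng = random.Random(seed)
    subjects = list(data_by_subject)
    estimates = []
    for _ in range(n_boot):
        # Draw subjects with replacement; each draw brings all of its voxels.
        drawn = [rng.choice(subjects) for _ in subjects]
        pooled = [value for s in drawn for value in data_by_subject[s]]
        estimates.append(fit(pooled))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper


# Toy data: 10 subjects with 3 voxel values each; the "model" is just a mean.
toy = {f"subject_{i}": [float(i)] * 3 for i in range(10)}
print(bootstrap_subjects(toy, statistics.mean))
```

Resampling at the subject level respects the dependence among voxels within a study, which resampling individual voxels would ignore.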

Choosing a final threshold value after the second probability maps 
are made is a tradeoff between sensitivity and specificity. OASIS is 
flexible, and the appropriate false positive rate may be selected for a 
particular application. Table 2 shows the threshold values, sensitivity,
and Dice similarity coefficient (DSC; Dice, 1945) for four different false
positive rates for the model fit over all of the studies in Validation Set 1.
OASIS detected lesions in many of the healthy subjects. Table 3 shows 
the mean volume of false positive lesions detected in the healthy and 
MS subjects for the four threshold values from Table 2. The volume of 
false positives for both the MS and healthy subjects is comparable. 
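
For reference, the three voxel-level quantities reported in Table 2 can be computed from a binary segmentation and a gold standard as sketched below. The function name and the toy masks are illustrative, and the sketch assumes that both lesion and non-lesion voxels are present so that no denominator is zero.

```python
def segmentation_metrics(predicted, truth):
    """Sensitivity, false positive rate, and Dice similarity coefficient (DSC)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    return {
        "sensitivity": tp / (tp + fn),           # fraction of lesion voxels found
        "false_positive_rate": fp / (fp + tn),   # fraction of healthy voxels flagged
        "dsc": 2 * tp / (2 * tp + fp + fn),      # overlap of the two lesion masks
    }


# Toy flattened masks: 1 = lesion voxel, 0 = non-lesion voxel.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
print(segmentation_metrics(predicted, truth))
```

Because healthy voxels vastly outnumber lesion voxels, a small false positive rate can still produce many false positive voxels, which is why the DSC in Table 2 does not simply increase with sensitivity.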

3.2. Validation Set 1: neuroradiologist rating results 

For the neuroradiologist rankings of the OASIS and LesionTOADS
segmentations for the 98 MS subjects in Validation Set 1, we
performed a paired t-test to assess the difference in the means of
the OASIS scores and the LesionTOADS scores. This difference
was found to be 12.6, with a 95% confidence interval of (9.6, 15.8),
p-value < 10⁻¹². The OASIS empirical threshold was ranked higher
than the LesionTOADS segmentation in 73 (95% CI: [64, 81]) of the 98
studies, or 74% (95% CI: [65%, 82%]). We nonparametrically bootstrapped
the subjects with replacement to produce the confidence intervals for
the rankings.

Fig. 4. Partial ROC curve for the voxel-level detection of lesions in the testing set of
Validation Set 1 for different thresholds of the probability maps produced from
OASIS for clinically relevant false positive rates of 1% and below. Bootstrapped 95%
confidence bands are also provided. The vertical axis of the partial ROC curve shows the
true positive rate (sensitivity) for a given threshold of the probability map and the
horizontal axis shows the false positive rate (1 - specificity) for this threshold.

Table 2
Binary segmentation thresholds with false positive rate, sensitivity and DSC for
Validation Set 1.

False positive rate   Sensitivity   Threshold value   DSC
1%                    80%           0.10              0.55
0.75%                 76%           0.12              0.58
0.5%                  69%           0.16              0.61
0.25%                 58%           0.23              0.59

3.3. Validation Set 2: neuroradiologist rating results 

Table 4 contains summary statistics for the scores from the
neuroradiologist ratings of the three segmentations for all 189 studies.
The OASIS Validation Set 1 threshold segmentations and the LesionTOADS
segmentations have a much lower first quartile than the OASIS
empirical threshold segmentations. For this analysis we focus mainly
on the difference between the OASIS empirical threshold and the
LesionTOADS segmentation, as the OASIS Validation Set 1 threshold
did not perform well on this new data set. This was expected, as the
probability map threshold needs to be adjusted to maintain the same
false positive rate when moving to a new data set. We performed a
paired t-test to assess the difference in the means of the OASIS empirical
threshold scores and the LesionTOADS scores. This difference was
found to be 16.6, with a 95% confidence interval of (13.3, 20.0),
p-value < 10⁻¹⁴. The OASIS empirical threshold was ranked higher
than the LesionTOADS segmentation in 146 (95% CI: [135, 157]) of the 189
cases, or 77% (95% CI: [71%, 83%]). We nonparametrically bootstrapped
the subjects with replacement to produce the confidence intervals for
the rankings.

To assess rater reliability among the 20 duplicated MRI studies, we 
calculated the intraclass correlation coefficient: 0.61 (95% CI: [0.69, 
0.81]). The rankings for the LesionTOADS images and the OASIS em- 
pirical threshold were preserved in the duplicate rankings for 17 of 
the 20 images (95% CI: [14, 20]). We nonparametrically bootstrapped 
with replacement the subjects to produce the confidence intervals for 
both the intraclass correlation coefficients and the rankings. 
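
The text does not state which form of the intraclass correlation coefficient was used. As an illustration only, the sketch below assumes a one-way random-effects ICC for single ratings, computed from the between-study and within-study mean squares of the repeated scores; the function name and the toy scores are hypothetical.

```python
import statistics


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (an assumed ICC form).

    ratings: one list per study, each holding the k repeated scores it received.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = statistics.mean([x for row in ratings for x in row])
    study_means = [statistics.mean(row) for row in ratings]
    # Between-study and within-study mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in study_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, study_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy example: three duplicated studies, each scored twice on the 0 to 100 scale.
print(icc_oneway([[10, 12], [50, 48], [90, 88]]))
```

Values near 1 indicate that repeated scores of the same study agree closely relative to the spread between studies; values near 0 or below indicate poor repeatability.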

3.4. Validation Set 2: rater agreement with neuroradiologist, neurologist, 
and radiologist 

Table 5 contains summary statistics for the scores from the
neuroradiologist, neurologist, and radiologist ratings of the three
segmentations for the set of 50 studies selected to assess rater
reliability. Fig. 5 shows a notched box plot of these scores for each
rater. From the box plot we see that there is a statistically significant
difference between the medians for all three segmentations for the
neuroradiologist and neurologist. There was not a statistically
significant difference in the medians of the scores for the three
segmentations by the radiologist. Moreover, all three raters indicated
that the OASIS Validation Set 1 segmentations and the LesionTOADS
segmentations have a much lower first quartile than the OASIS empirical
threshold segmentations. The outliers in the boxplots can be explained
either as errors in processing, such as registration errors or severe
artifacts, or as studies on which none of the segmentation methods
performed well. We did not remove these studies from the analysis,
because we want to assess the performance of OASIS in the setting of an
image processing pipeline, where images may not be properly registered
or may contain artifacts.

Table 3
Volume of false positive lesion in healthy volunteers and MS subjects from Validation
Set 1 (in cm³); the actual mean lesion volume is 0 cm³ for healthy volunteers and
11.2 cm³ (IQR: [1.7 cm³, 16.6 cm³]) for MS subjects.

Threshold value   Healthy mean (IQR)   MS mean (IQR)
0.10              8.6 (4.6, 10.6)      10.9 (7.6, 13.6)
0.12              6.7 (3.1, 8.2)       8.0 (5.2, 10.3)
0.16              4.3 (1.5, 5.7)       5.2 (3.0, 7.0)
0.23              2.2 (0.7, 2.8)       2.5 (1.2, 3.5)

Table 4
Summary statistics of image ratings of Validation Set 2 for neuroradiologist on 189
studies.

               OASIS                        OASIS                 LesionTOADS
               Validation Set 1 threshold   Empirical threshold
Minimum        3.7                          3.7                   2.7
1st quartile   27.3                         55.7                  21.7
Median         42.0                         68.3                  51.0
Mean           43.2                         64.1                  47.5
3rd quartile   57.7                         76.3                  71.0
Maximum        99.3                         99.0                  97.3

Again, we will focus mainly on the difference between the OASIS 
empirical threshold and the LesionTOADS segmentation. We performed 
a paired t-test to assess the difference in the means of the OASIS empir- 
ical threshold scores and the LesionTOADS scores. These differences can 
be found in Table 5. The mean for the OASIS empirical threshold was 
greater than the mean for the LesionTOADS scores for all three raters. 
This difference was found to be statistically significant for both the
neuroradiologist and neurologist (p-values < 10⁻⁴ and < 10⁻³,
respectively), but not for the radiologist (p-value 0.5). The neuroradiologist
and the neurologist tended to spread their scores more, and this 
allowed better comparison of the segmentation algorithms. Table 5 
also shows the percentage of time the OASIS empirical threshold was 
ranked higher than LesionTOADS segmentation in the 50 studies. We 
nonparametrically bootstrapped with replacement the subjects to pro- 
duce the confidence intervals for the rankings. 
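
The paired comparison used for each rater can be sketched as follows: the per-study score differences give the mean difference and the paired t statistic, and the sign of each difference gives the fraction of studies in which one method was ranked higher. The function name and the toy scores are hypothetical, and the p-value and confidence interval are left to a statistics package because the t distribution is not available in the Python standard library.

```python
import math
import statistics


def paired_comparison(scores_a, scores_b):
    """Mean difference, paired t statistic, and fraction of studies where A > B."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    return {
        "mean_diff": mean_diff,
        "t": mean_diff / std_err,
        "df": n - 1,
        "frac_a_higher": sum(d > 0 for d in diffs) / n,
    }


# Toy scores on the 0 to 100 scale for four studies rated under two methods.
oasis_scores = [70, 65, 80, 55]
lesiontoads_scores = [60, 66, 70, 50]
print(paired_comparison(oasis_scores, lesiontoads_scores))
```

Pairing by study removes between-study variation in difficulty, so the test reflects only how the two segmentations of the same study were scored relative to each other.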

To assess rater reliability among the 5 duplicated MRI studies, we 
calculated the intraclass correlation coefficient and the number of 
times the rankings for the LesionTOADS images and the OASIS empir- 
ical threshold were preserved. We nonparametrically bootstrapped 
with replacement the subjects to produce the confidence intervals 
for both the intraclass correlation coefficients and the rankings. For 
the neuroradiologist, the intraclass correlation coefficient for the 5 re- 
peated studies is 0.55 (0.21, 0.82) and the number of preserved rank- 
ings is 4 (2,5). For the neurologist, 0.32 (-0.10, 0.68) and 4 (2,5). For 
the radiologist, -0.38 (-0.35, 0.71) and 2 (0,4). The repeated rank- 
ings for each rater for the 5 subjects are reported in the Appendix. 

We calculated the rater agreement for the ranking of the OASIS 
empirical threshold versus LesionTOADS. We decided to use the rank- 
ings of the scores to assess rater agreement rather than the scores 
themselves, because, as shown from the intraclass correlation coeffi- 
cient, the scores are not very reliable, while the order in which the ob- 
servers rank the segmentations, on the other hand, is quite reliable. 
We calculated the kappa statistic to assess the reliability of the rank- 
ings for each pair of raters and nonparametrically bootstrapped with 
replacement the subjects to produce the confidence intervals for the 
kappa statistics. The kappa statistic for the rater agreement between
the neuroradiologist and the neurologist was 0.47 (0.20, 0.75), the
neuroradiologist and radiologist 0.02 (-0.26, 0.30), and the
neurologist and radiologist -0.09 (-0.37, 0.19).
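
For a pair of raters, the kappa statistic on the binary outcome "OASIS ranked higher than LesionTOADS" can be computed as sketched below. This assumes the unweighted Cohen's kappa; the function name and the toy rankings are illustrative only.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    # Undefined (division by zero) if both raters always use one same category.
    return (observed - expected) / (1 - expected)


# Toy rankings: 1 = OASIS ranked higher, 0 = LesionTOADS ranked higher.
rater_1 = [1, 1, 1, 0]
rater_2 = [1, 1, 0, 0]
print(cohens_kappa(rater_1, rater_2))
```

A kappa near 0 means the raters agree no more often than expected by chance, which is how the values involving the radiologist above should be read.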

4. Discussion 

OASIS may be used to assist or even replace manual segmentation
of MS lesions in the brain. After training and adjustment of the
population-level threshold, our fully automatic method does not require
human input and avoids the variability introduced by manual
segmentation. Because the statistical model has an explicit form, OASIS
can easily be adapted and trained for cases where more or fewer
imaging sequences are available.

Table 5
Mean and standard deviation of the rating from the neuroradiologist, neurologist, and radiologist for OASIS Validation Set 1 threshold, OASIS empirical threshold and LesionTOADS
on 50 studies from Validation Set 2; mean difference between OASIS empirical threshold and LesionTOADS and percentage of times OASIS was ranked higher than LesionTOADS on
these images.

                   OASIS              OASIS         LesionTOADS   Mean difference     Percentage rank
                   Validation Set 1   Empirical
                   Mean (SD)          Mean (SD)     Mean (SD)     (95% CI)            (95% CI)
Neuroradiologist   46.3 (22.0)        66.1 (20.2)   47.3 (27.2)   18.7 (11.2, 26.3)   76% (64%, 88%)
Neurologist        48.7 (24.3)        73.1 (18.5)   56.6 (26.0)   16.5 (7.0, 25.9)    66% (52%, 78%)
Radiologist        71.6 (19.6)        74.1 (17.9)   71.8 (16.5)   2.3 (-4.2, 8.8)     52% (38%, 66%)

With the OASIS model, a recalibration of the population-level
segmentation threshold is necessary for each new data set but can be
done on a fairly limited number of subjects, as in the example from
this paper. Because a set of subjects is required to tune this
population-level threshold, fully automatic segmentation of a single
study from a new imaging center may not be feasible with the OASIS
model. However, in
these cases the threshold can be adjusted very quickly manually 
(2-5 min) by visual inspection of 3-4 slices by adjusting just one pa- 
rameter. When using an ROC curve for classification, thresholds for 
subpopulations with different covariate values may need to be de- 
fined differently in order to keep false positive rates the same across 
those subpopulations (Pepe, 2003). Therefore, it was expected that 
the ROC threshold would need to be adjusted to maintain the same 
false positive rate from Validation Set 1 in Validation Set 2. This 
threshold is the only tuning parameter in OASIS that must be adjusted 
when moving to a new data set, and this adjustment is very fast and 
intuitive to make and does not require multiple iterations of segmen- 
tations. We believe that OASIS holds promise for use in multicenter 
MRI studies, with adjustment of the population level threshold for 
each site. 

Future work includes further validation of OASIS under changes in
imaging center and protocol, as well as assessment of the
reproducibility of the OASIS segmentations. One resource for this is the
MS Lesion Segmentation Challenge (Styner et al., 2008), a common
database for MS lesion segmentation algorithms. We plan to do further
validation with this database as well as with volumes from additional
imaging centers. For this analysis we did not have scan-rescan MRI
available. Such scans are crucial for assessing the reproducibility of
the method, and we plan to acquire them in the future.

In contrast to many automatic segmentation techniques, OASIS is 
computationally fast. While training the model on the 131 studies 
from Validation Set 1 takes five hours on a standard workstation, 
this process is conducted only once. The results are summarized
as the two sets of 21 coefficients in model [1]. Also, the model
may be trained on fewer studies, as shown in the partial ROC analysis
within Validation Set 1; the performance of the model remains stable
when trained on subsets of 20 studies. Using this fitted model to
generate a probability map of the entire brain from a set of new images
takes only 30 min. These times are for standard workstations and 
are expected to drop dramatically with multi-core parallel computing 
and improved technologies. The Gaussian smoothing is the slowest 
step of the algorithm, and these computations can be parallelized to 
substantially decrease the time of the entire algorithm to approxi- 
mately 5 min. 

After making the image ratings for Validation Set 2, the neuroradiol- 
ogist was unblinded and reviewed the three segmentations, providing 
comments about the strengths and weaknesses of each. The OASIS em- 
pirical threshold performed much better than the OASIS Validation Set 1 
threshold. The neuroradiologist reported a preference for the smooth- 
ness of the OASIS segmentations in contrast to the LesionTOADS seg- 
mentation, which often appeared speckled. The OASIS segmentations 
often had artifacts in the pineal glands and the choroid plexus of the 
ventricles. This may be explained by the fact that OASIS was trained 
on FLAIR images acquired before a gadolinium-based contrast agent 
was administered to the patient, while the validation was done with 
FLAIR images that were acquired after gadolinium administration. 
Voxels in the choroid plexus and pineal glands, which enhance with 
gadolinium, were brighter and were thus misclassified as lesion. 
LesionTOADS does not make a similar error, as it imposes topological 
constraints that preclude these structures from being identified as
lesions. Further refinements of OASIS may account for such complex
changes of protocol.

Fig. 5. Notched box plot of the results from the neuroradiologist, neurologist, and radiologist image ratings for segmentations of the 50 MRI studies from Validation Set 2: the OASIS
Validation Set 1 threshold segmentations, the OASIS empirically adjusted threshold segmentations, and the LesionTOADS segmentations.

The LesionTOADS segmentations were more variable than those of OASIS
and did not perform well on cases with low lesion load. The OASIS
segmentation had systematic errors in the medial frontal cortex and the
brainstem. On the other hand, LesionTOADS avoided false positives in
the brainstem because it only segments lesions in the cerebrum. Fig. 6
shows a slice from a subject with an example of a lesion that OASIS
segments in the cerebellum. Fig. 6A shows a single slice of the FLAIR
volume, Fig. 6B shows a single slice of the T1-weighted volume, Fig. 6C
shows the LesionTOADS segmentation of the slice, and Fig. 6D shows the
OASIS segmentation of the slice. LesionTOADS does not segment the
cerebellum, whereas OASIS does not restrict the areas that it segments
and is able to find the lesion in this slice.

OASIS is not an atlas-based method and therefore does not take
into account anatomical information during segmentation, such as
tissue class. Further incorporation of anatomical information, such
as the tissue class segmentations from LesionTOADS, may help to
avoid false positives in areas where we have prior knowledge
that lesions are rare and where OASIS made systematic false positive
errors, such as the medial frontal cortex and the brainstem. This could
also help with the false positives in the pineal glands and the choroid
plexus of the ventricles in the post-contrast FLAIR, as these are areas
where lesions do not occur in MS.

The use of smoothed images in OASIS is similar to their use
for inhomogeneity correction in MRI. For inhomogeneity
correction, an image is smoothed to suppress the details of the
image and then the original image is divided by this smoothed image 
in order to correct the image inhomogeneity (Axel et al., 1987). Our 
method differs from this in that we do not divide the original image 
by the smoothed volume. Instead we use the smoothed volume as a co- 
variate in our model. We also use multiresolution smoothed volumes, in 
contrast to just one smoothed volume for correction. 
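
The contrast between the two uses of smoothing can be illustrated on a one-dimensional signal. In the sketch below, the kernel widths, the edge handling, and the function names are all assumptions chosen for illustration; they are not the settings used in OASIS, which operates on three-dimensional volumes.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian smoothing of a 1D signal, replicating values at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    return [
        sum(w * signal[min(max(i + j - radius, 0), n - 1)] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def smoothed_covariates(signal, sigmas=(1.0, 2.0)):
    """Per-voxel rows (intensity, smoothed at each sigma) used as model covariates."""
    smoothed = [smooth(signal, s) for s in sigmas]
    return [(signal[i], *[sm[i] for sm in smoothed]) for i in range(len(signal))]


signal = [0.0, 0.0, 10.0, 0.0, 0.0]
# Inhomogeneity correction would divide signal[i] by a smoothed value; here the
# smoothed values are kept alongside the original intensity as extra covariates.
for row in smoothed_covariates(signal):
    print(row)
```

Keeping the smoothed values as covariates lets the model weigh local intensity against its regional context at more than one scale, rather than committing to a single fixed correction.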

Other methods of capturing inhomogeneities may be used in the 
OASIS model as an alternative to the smoothed volumes. Alternative 
smoothers may be used instead of the Gaussian kernel and may be 
more appropriate in other applications. We decided to use the Gauss- 
ian filter because it is widely used, can be applied to any image, and is 
relatively computationally fast. The OASIS modeling framework is 
very flexible, however, and can be adapted for other methods of cap- 
turing the bias field and regional intensity variation. 

We used the 15th percentile of FLAIR intensities in the brain to
create the brain tissue mask. Other segmentations can be used to
remove the CSF. We used the 15th percentile of FLAIR intensities
because it is fast and performed well in this application.

Fig. 6. Example of a cerebellum lesion classified using OASIS in Validation Set 2: A. FLAIR volume; B. T1-weighted volume; C. LesionTOADS segmentation; and D. OASIS empirically
adjusted threshold segmentation.

Lesions that are hypointense on FLAIR, because of high free water 
content, are not detected by OASIS. The method models only candidate 
voxels, the top 15% of voxels in the cerebral matter-masked FLAIR vol- 
ume, to minimize the number of false positives. In the FLAIR volume, 
such lesions are characterized by hypointensities in the center of a le- 
sion and hyperintensities around the edges. Therefore the center of 
the lesions is excluded from the candidate voxels. Future work includes 
expanding the OASIS model to segment these lesions. This could be 
done by fitting another OASIS model trained only on lesion voxels 
that appear hypointense on FLAIR. The binary segmentations
from the original OASIS model and this model could then be combined 
to produce a complete lesion segmentation. 
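
The candidate voxel selection that causes hypointense lesion centers to be missed can be sketched with a simple percentile cutoff. The nearest-rank percentile definition, the strict inequality, and the function names are assumptions made for illustration; the intensities are toy values standing in for a masked FLAIR volume.

```python
import math


def percentile_cutoff(values, percent):
    """Nearest-rank percentile of a list of intensities (percent is an integer)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def candidate_voxels(flair, top_percent=15):
    """Indices of voxels whose FLAIR intensity lies in the top `top_percent` percent."""
    cutoff = percentile_cutoff(flair, 100 - top_percent)
    return [i for i, value in enumerate(flair) if value > cutoff]


# Toy masked FLAIR intensities 1..100; only the brightest voxels are modeled.
flair = list(range(1, 101))
selected = candidate_voxels(flair)
print(len(selected), min(flair[i] for i in selected))
```

A voxel at the dark center of a lesion falls below the cutoff and is never passed to the model, regardless of how its other sequences look, which is the limitation described above.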

Like other voxel-based methods, OASIS is sensitive to major mis- 
registrations within an MRI study. However, in part because it incor- 
porates spatial smoothing, OASIS is not sensitive to minor errors in 
registration. By simultaneously comparing data from multiple se- 
quences and only considering candidate voxels, OASIS is able to dis- 
tinguish between artifacts and lesion. 

OASIS uses a voxel-level model for assessing the outcome. The as- 
sumption of independence between voxels is imperfect, as lesions 
consist of clusters of voxels. In this work we use smoothing in the 
smoothed volumes and smoothing of the predicted probabilities of 
the model to incorporate the spatial nature of the data. Nevertheless, 
further incorporation of neighboring voxel information is warranted. 